ABSTRACT (UNCLASSIFIED)

The Brain-State-in-a-Box (BSB) neural network is an auto-associative network: it associates vectors with other vectors. Auto-association results in a system that has learned to respond strongly to each vector of a given set of input vectors by returning that same vector multiplied by its recollection strength, where all vectors are bounded by the box. This makes it possible to converge to one vector of the learned set even if the actual input vector is not known to the system. When radar signals are translated into vectors, the BSB network can be used for radar pulse classification. The whole classification process is shown, together with its relation to other, non-neural algorithms.


TNO report

report no.: FEL-90-B023
title: BSB Radar pulse classification
authors: P.P. Meiler, A.M. van Wezenbeek
institute: Fysisch en Elektronisch Laboratorium TNO
date: August 1990
no. in iwp '90: 708

SUMMARY (UNCLASSIFIED)

The Brain-State-in-a-Box (BSB) neural network is an auto-associative network. It associates vectors with vectors. Auto-association produces a system that has learned to react strongly to a set of input vectors. It does so by generating, for an input vector from this set, the same vector as output, multiplied by a value that corresponds to the strength of the reaction, while the vector is forced to stay within the box. This offers the possibility of convergence to one of the learned vectors, even when the actual input vector is not known to the system. When radar signals are translated into vectors, the BSB network can be used for radar pulse classification. The entire classification process and its relation to other, non-neural algorithms are described.
1  INTRODUCTION

From 1 December 1988 to 30 June 1989 the author was involved in the study and implementation of the Brain-State-in-a-Box (BSB) neural network. This report presents both the theoretical issues and the results of the practical experiments done in this period. The reader should be familiar with the basics of neural computing (see [D.E. Rumelhart, J.L. McClelland 1987] for an introduction).

At the 1988 INNS conference in Boston, Anderson showed that the BSB network can be used for radar signal categorisation [J.A. Anderson 1988]. This was the direct motive for TNO to investigate its possibilities. Related articles can be found in [J.A. Anderson 1977], [J.A. Anderson 1983], [J.A. Anderson, M.C. Mozer 1981] and [J.A. Anderson, G.E. Hinton 1981].

Chapter 2 starts with a general introduction to descent techniques, of which the BSB network in fact implements the simplest one; this is shown in chapter 3. Chapter 4 shows how the BSB network can be used for the classification of clusters. Conclusions are given in chapter 5. Two appendices are included: appendix A discusses the topics from linear algebra used in this report, and appendix B contains the documentation of the C sources used to implement the BSB network.
2  DESCENT TECHNIQUES

This chapter summarises descent techniques. All derivations can be found in [S.L.S. Jacoby, J.S. Kowalik, J.T. Pizzo 1972] or [H.W. Sorenson 1980]. Descent techniques consist of three parts. First, given an initial vector $x_k$ in N-dimensional space (having N elements), a descent direction $s_k$ (with $|s_k| = 1$) must be found: a vector pointing in a direction in which the objective function $E$ takes a lower value. This can also be interpreted as finding a point with less energy. Second, a descent step length $\gamma_k$ must be found, indicating the length of the step taken in the direction $s_k$. Finally, the descent step is made according to formula 2.1, so that $E(x_{k+1}) < E(x_k)$:

$$ x_{k+1} = x_k + \gamma_k s_k \tag{2.1} $$

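The vector $s_k$ is a descent direction if:

$$
\lim_{\gamma \to 0} \frac{E(x_k + \gamma s_k) - E(x_k)}{\gamma}
= \left. \frac{dE(x_k + \gamma s_k)}{d\gamma} \right|_{\gamma = 0}
= s_k^T \nabla E(x_k) < 0 \tag{2.2}
$$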

Notice that the inner product of the descent direction and the gradient is the directional derivative. Requiring formula 2.2 to be negative means that the descent direction must make an angle $\alpha$ with the gradient such that $\pi/2 < \alpha < 3\pi/2$: the descent direction must point against the gradient.
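The descent condition of formula 2.2 can be checked numerically. The following minimal C sketch (C being the language of the report's own implementation, appendix B) compares the difference quotient with the directional derivative $s^T \nabla E(x)$. The objective $E(x) = x_0^2 + 4x_1^2$ and all function names are hypothetical choices for illustration only.

```c
#include <assert.h>

/* Hypothetical objective for illustration: E(x) = x0^2 + 4*x1^2. */
static double energy(const double x[2]) {
    return x[0] * x[0] + 4.0 * x[1] * x[1];
}

/* Its gradient: grad E(x) = (2*x0, 8*x1). */
static void gradient(const double x[2], double g[2]) {
    g[0] = 2.0 * x[0];
    g[1] = 8.0 * x[1];
}

/* Right-hand side of formula 2.2: the directional derivative s^T grad E(x). */
static double directional_derivative(const double x[2], const double s[2]) {
    double g[2];
    gradient(x, g);
    return s[0] * g[0] + s[1] * g[1];
}

/* Left-hand side of formula 2.2, evaluated for a small but finite gamma. */
static double difference_quotient(const double x[2], const double s[2], double gamma) {
    assert(gamma > 0.0);
    double y[2] = { x[0] + gamma * s[0], x[1] + gamma * s[1] };
    return (energy(y) - energy(x)) / gamma;
}

/* s is a descent direction at x when s^T grad E(x) < 0 (formula 2.2). */
static int is_descent_direction(const double x[2], const double s[2]) {
    return directional_derivative(x, s) < 0.0;
}
```

For $x = (1, 1)$ the gradient is $(2, 8)$, so $s = (-1, 0)$ gives $s^T \nabla E = -2 < 0$ (a descent direction), while $s = (0, 1)$ gives $+8$ (not a descent direction).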

The  gradient  descent  techniques  have  in  common  that  the  first  or  second  order  derivative  of  the 
objective  function  is  used  to  find  its  minimum.  For  some  applications  it  is  impossible  to  determine 
the  gradient.  The  conjugate  descent  technique  offers  an  algorithm  which  in  principle  avoids  the 
calculation  of  the  gradient. 


2.1  Gradient  descent 

For gradient descent, the locally steepest direction is chosen as the descent direction. The following derivation shows that this choice leads to a descent direction given by the product of the inverse of a metric matrix and the gradient vector. First define the distance $d$ between $x_1$ and $x_2$:

$$ d(x_1, x_2) = \left( (x_1 - x_2)^T A (x_1 - x_2) \right)^{1/2} \tag{2.3} $$

$A$ is a positive definite (so that $d > 0$ for $x_1 \ne x_2$, see appendix A) $N \times N$ symmetric metric matrix. All points $x$ at a distance $\delta$ from $x_k$ therefore lie on an N-dimensional ellipse (see section 3.1 for an explanation of the ellipse):

$$ \delta^2 = (x - x_k)^T A (x - x_k) \tag{2.4} $$

From a point $x_k$ we take a step $\Delta x_k$ such that the resulting vector lies on the ellipse, and we choose this $\Delta x_k$ so that the associated energy $E$ is minimized:

$$ \min_{\Delta x_k} \left\{ E(x_k + \Delta x_k) \;\middle|\; (\Delta x_k)^T A \Delta x_k = \delta^2 \right\} \tag{2.5} $$

We approximate $E(x_k + \Delta x_k)$ by its first-order Taylor series expansion about $x_k$:

$$ E(x_k + \Delta x_k) \approx E(x_k) + (\Delta x_k)^T \nabla E(x_k) \tag{2.6} $$

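To minimize $E(x_k + \Delta x_k)$ we must now minimize $(\Delta x_k)^T \nabla E(x_k)$, because $E(x_k)$ is fixed. So:

$$ \min_{\Delta x_k} \left\{ (\Delta x_k)^T \nabla E(x_k) \;\middle|\; (\Delta x_k)^T A \Delta x_k = \delta^2 \right\} \tag{2.7} $$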

One way to find this minimum is to use the Lagrange function:

$$ L(x, u) = E(x) - \sum_{j=1}^{m} u_j g_j(x) \tag{2.8} $$

In formula 2.8, $u = (u_1\ u_2\ \ldots\ u_m)^T$ is a vector of Lagrange multipliers corresponding to the constrained minimization problem:

$$ \min_x \left\{ E(x) \;\middle|\; g_j(x) \ge 0,\ j = 1, 2, \ldots, m \right\} \tag{2.9} $$

The primal function $L_p$ is defined as:

$$ L_p(x) = \max_{u \ge 0} L(x, u) \tag{2.10} $$

and the primal problem is the problem that we want to solve:

$$ \min_x L_p(x) = \min_x E(x) \tag{2.11} $$
When, as in our case, the original problem has the form $\min\{E(x) \mid g_j(x) = 0,\ j = 1..m\}$, the solution can be found by solving the classical saddle-point conditions:

$$ \nabla_x L(x_s, u_s) = 0, \qquad \nabla_u L(x_s, u_s) = 0 \tag{2.12} $$

with $L(x_s, u) \le L(x_s, u_s) \le L(x, u_s)$ for all $u \ge 0$ and all $x$, defining the saddle point $(x_s, u_s)$. So:


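$$ L(\Delta x_k, u) = (\Delta x_k)^T \nabla E(x_k) - u \left( (\Delta x_k)^T A \Delta x_k - \delta^2 \right) \tag{2.13} $$

$$ \frac{\partial L}{\partial \Delta x_k} = \nabla E(x_k) - 2uA\Delta x_k = 0 \tag{2.14} $$

$$ \frac{\partial L}{\partial u} = -\left( (\Delta x_k)^T A \Delta x_k - \delta^2 \right) = 0 \tag{2.15} $$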

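Rewriting formula 2.14 gives:

$$ \Delta x_k = \frac{1}{2u} A^{-1} \nabla E(x_k) \tag{2.16} $$

Because $A^{-1}$ is positive definite, $\nabla E(x_k)^T A^{-1} \nabla E(x_k) > 0$ whenever $\nabla E(x_k) \ne 0$, so the descent condition of formula 2.2 requires $u < 0$. Up to a positive scale factor, the descent direction is therefore:

$$ s_k = -A^{-1} \nabla E(x_k) \tag{2.17} $$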


2.1.1  First  order 

For first-order gradient descent we choose $A = I$ in formula 2.17. This results in:

$$ s_k = -\nabla E(x_k) \tag{2.18} $$

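The step length $\gamma_k$ is chosen such that $E(x_k) - E(x_{k+1})$ is maximized. Using the second-order Taylor expansion of $E$ about $x_k$:

$$
\begin{aligned}
E(x_{k+1}) &= E(x_k + \gamma_k s_k) \\
&\approx E(x_k) + \gamma_k s_k^T \nabla E(x_k) + \tfrac{1}{2} \gamma_k^2 s_k^T \mathrm{Hess}(x_k) s_k \\
&= E(x_k) - \gamma_k \nabla E(x_k)^T \nabla E(x_k) + \tfrac{1}{2} \gamma_k^2 \nabla E(x_k)^T \mathrm{Hess}(x_k) \nabla E(x_k)
\end{aligned} \tag{2.19}
$$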
We require that:

$$ \frac{d\left( E(x_{k+1}) - E(x_k) \right)}{d\gamma_k} = 0 \tag{2.20} $$

which gives

$$ \gamma_k = \frac{\nabla E(x_k)^T \nabla E(x_k)}{\nabla E(x_k)^T \mathrm{Hess}(x_k) \nabla E(x_k)} \tag{2.21} $$

First-order gradient descent in fact generates search directions that asymptotically alternate between just two directions. As a consequence, the rate of convergence is very slow, particularly in the vicinity of a local minimum¹. This method is therefore generally regarded as having unsatisfactory performance.
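A minimal sketch of first-order gradient descent (formula 2.18) with the step length of formula 2.21, assuming a quadratic objective $E(x) = a + b^T x + (x^T A x)/2$ so that $\mathrm{Hess}(x) = A$ is constant. The type and function names are hypothetical; this illustrates the formulas and is not the report's C code.

```c
#include <assert.h>

#define N 2   /* dimension of the example */

/* Quadratic objective E(x) = a + b^T x + x^T A x / 2; grad E = b + A x, Hess = A. */
typedef struct {
    double a;
    double b[N];
    double A[N][N];
} Quadratic;

static void grad(const Quadratic *q, const double x[N], double g[N]) {
    for (int i = 0; i < N; i++) {
        g[i] = q->b[i];
        for (int j = 0; j < N; j++) g[i] += q->A[i][j] * x[j];
    }
}

/* First-order gradient descent with s_k = -grad E(x_k) and the step length of
   formula 2.21. Returns the number of steps taken before |grad E| < tol. */
static int steepest_descent(const Quadratic *q, double x[N], double tol, int max_steps) {
    assert(tol > 0.0 && max_steps > 0);
    for (int k = 0; k < max_steps; k++) {
        double g[N], gg = 0.0, gAg = 0.0;
        grad(q, x, g);
        for (int i = 0; i < N; i++) gg += g[i] * g[i];
        if (gg < tol * tol) return k;                    /* gradient (almost) zero */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) gAg += g[i] * q->A[i][j] * g[j];
        double gamma = gg / gAg;                         /* formula 2.21 with Hess = A */
        for (int i = 0; i < N; i++) x[i] -= gamma * g[i]; /* step along s_k = -g */
    }
    return max_steps;
}
```

With $A = \mathrm{diag}(1, 10)$ and $b = (-1, -10)^T$ the minimum is $-A^{-1} b = (1, 1)^T$. Starting from the origin, successive gradients are orthogonal and alternate between two directions, so the iteration needs a number of steps where the second-order method of section 2.1.2 needs only one.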


2.1.2  Second  order 

The second-order gradient descent technique uses the Hessian matrix $\mathrm{Hess}(x_k)$ as metric matrix, so formula 2.17 gives:

$$ s_k = -\mathrm{Hess}(x_k)^{-1} \nabla E(x_k) \tag{2.22} $$

which yields descent directions provided $\mathrm{Hess}(x)$ is positive definite. If we assume a quadratic cost function $E = a + b^T x + (x^T A x)/2$, then a necessary and sufficient condition for the minimum is $\nabla E(x) = b + Ax = 0$ (with $\mathrm{Hess}(x) = A > 0$), giving $x = -A^{-1} b$². With $\gamma = 1$:

$$
\begin{aligned}
x_1 &= x_0 - \mathrm{Hess}(x_0)^{-1} \nabla E(x_0) \\
&= x_0 - A^{-1} (b + A x_0) \\
&= -A^{-1} b
\end{aligned} \tag{2.23}
$$

and  thus  the  minimum  can  be  found  in  one  step.  A  clear  disadvantage  of  this  method  is  the 
calculation  of  the  Hessian  matrix  and  its  inverse. 
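For a quadratic objective the one-step property of formula 2.23 is easy to verify. The sketch below assumes two dimensions, so that $A^{-1}$ can be applied with Cramer's rule; the function name is hypothetical.

```c
#include <assert.h>

/* One second-order (Newton) step with gamma = 1, formula 2.23, for the quadratic
   E(x) = a + b^T x + x^T A x / 2 in two dimensions, where Hess(x) = A.
   A^{-1} is applied to the gradient with Cramer's rule. */
static void newton_step_2d(double A[2][2], const double b[2], double x[2]) {
    double g0 = b[0] + A[0][0] * x[0] + A[0][1] * x[1];   /* grad E(x) = b + A x */
    double g1 = b[1] + A[1][0] * x[0] + A[1][1] * x[1];
    double det = A[0][0] * A[1][1] - A[0][1] * A[1][0];
    assert(det != 0.0);
    double d0 = ( A[1][1] * g0 - A[0][1] * g1) / det;     /* d = A^{-1} grad E(x) */
    double d1 = (-A[1][0] * g0 + A[0][0] * g1) / det;
    x[0] -= d0;
    x[1] -= d1;
}
```

For $A$ with rows $(2, 1)$ and $(1, 3)$ and $b = (-3, -5)^T$, a single step from any starting point lands on $-A^{-1} b = (0.8, 1.4)^T$.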


2.2  Conjugate  descent 

Two vectors $x_1$ and $x_2$ are said to be mutually conjugate with respect to a symmetric positive definite matrix $A$ if $x_1^T A x_2 = 0$. In section 2.1 (formula 2.22) it was necessary to calculate the inverse of the Hessian. The conjugate methods are based on the fact that the inverse of a matrix $A$ can be found by using $n$ mutually conjugate vectors:

$$ A^{-1} = \sum_{i=1}^{n} \frac{x_i x_i^T}{x_i^T A x_i} \tag{2.24} $$

This is proven by showing that $AA^{-1} = I$: multiplying the right-hand side by $A x_j$ leaves only the term $i = j$ (because $x_i^T A x_j = 0$ for $i \ne j$), which returns $x_j$ for each of the $n$ linearly independent vectors $x_j$.

¹ Only when the minimum corresponds to a hole in the energy surface.

² This also shows that the problem of solving linear equations is equivalent to the problem of minimizing a quadratic function.
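Formula 2.24 can be checked on a small example. The sketch below builds $A^{-1}$ from $n = 2$ mutually $A$-conjugate vectors; the fixed size $N = 2$ and the function names are assumptions for illustration.

```c
#include <assert.h>

#define N 2   /* dimension of the example; formula 2.24 holds for any n */

/* A^{-1} = sum_i x_i x_i^T / (x_i^T A x_i) for N mutually A-conjugate
   vectors x_i, stored as the rows of X (formula 2.24). */
static void conjugate_inverse(double A[N][N], double X[N][N], double Ainv[N][N]) {
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++) Ainv[r][c] = 0.0;
    for (int i = 0; i < N; i++) {
        double xAx = 0.0;                                 /* x_i^T A x_i */
        for (int r = 0; r < N; r++)
            for (int c = 0; c < N; c++) xAx += X[i][r] * A[r][c] * X[i][c];
        assert(xAx > 0.0);                                /* A positive definite */
        for (int r = 0; r < N; r++)
            for (int c = 0; c < N; c++) Ainv[r][c] += X[i][r] * X[i][c] / xAx;
    }
}

/* Returns 1 if x_i^T A x_j is (numerically) zero for all i != j. */
static int are_conjugate(double A[N][N], double X[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            if (i == j) continue;
            double v = 0.0;
            for (int r = 0; r < N; r++)
                for (int c = 0; c < N; c++) v += X[i][r] * A[r][c] * X[j][c];
            if (v > 1e-12 || v < -1e-12) return 0;
        }
    return 1;
}
```

With $A$ having rows $(2, 1)$ and $(1, 3)$, the vectors $x_1 = (1, 0)^T$ and $x_2 = (1, -2)^T$ satisfy $x_1^T A x_2 = 0$, and the sum gives $A^{-1}$ with rows $(0.6, -0.2)$ and $(-0.2, 0.4)$.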


2.2.1  Directional 

This method uses a set of descent directions to minimize the objective function without calculating a derivative. Initially the directions are chosen linearly independent. For each descent direction $s_k$ the optimal step $\gamma_k$ is found by minimizing $E(x_k + \gamma_k s_k)$. After the last descent step from the set has been computed and taken, one mutually conjugate vector is computed and added to the set of descent directions, while the first direction is deleted from the set. After $n$ such updates, the set consists of $n$ mutually conjugate directions.


2.2.2  Gradient 

The  conjugate  gradient  technique  uses  the  gradient  for  iteratively  finding  the  set  of  mutually 
conjugate  vectors. 
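The report does not spell out the conjugate gradient iteration. As an illustration only, here is the textbook linear conjugate gradient method (with the Fletcher-Reeves choice of $\beta$) applied to the quadratic $E(x) = b^T x + (x^T A x)/2$; it is not taken from the report, and the names and the fixed size $N = 3$ are assumptions.

```c
#include <assert.h>

#define N 3   /* dimension of the example */

static double dot(const double u[N], const double v[N]) {
    double s = 0.0;
    for (int i = 0; i < N; i++) s += u[i] * v[i];
    return s;
}

static void matvec(double A[N][N], const double v[N], double out[N]) {
    for (int i = 0; i < N; i++) {
        out[i] = 0.0;
        for (int j = 0; j < N; j++) out[i] += A[i][j] * v[j];
    }
}

/* Linear conjugate gradient minimisation of E(x) = b^T x + (x^T A x)/2 for a
   symmetric positive definite A. r is the negative gradient -(b + A x); each new
   search direction p is built from r so that it is A-conjugate to the previous
   directions. Returns the number of steps taken (at most N). */
static int conjugate_gradient(double A[N][N], const double b[N], double x[N], double tol) {
    double r[N], p[N], Ap[N];
    matvec(A, x, Ap);
    for (int i = 0; i < N; i++) {
        r[i] = -(b[i] + Ap[i]);
        p[i] = r[i];
    }
    double rr = dot(r, r);
    int k = 0;
    while (k < N && rr > tol * tol) {
        matvec(A, p, Ap);
        double pAp = dot(p, Ap);
        assert(pAp > 0.0);                   /* A must be positive definite */
        double alpha = rr / pAp;             /* exact line search along p */
        for (int i = 0; i < N; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;           /* Fletcher-Reeves coefficient */
        for (int i = 0; i < N; i++) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        k++;
    }
    return k;
}
```

In exact arithmetic the $N$ search directions it generates are mutually $A$-conjugate, so the minimum $-A^{-1} b$ is reached in at most $N$ steps.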
3  BSB NEURAL NETWORK

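In this chapter the various properties of the BSB recall formula are shown. The recall formula is defined as follows, where $[x_{min}, x_{max}]$ (with $x_{min} < x_{max}$) is the range of each vector element, i.e. the bounding hypercube:

$$ x_{k+1} = S(x_k + \gamma W x_k) \qquad (\gamma > 0) \tag{3.1} $$

$$
S(x) = \begin{cases}
x_{max} & x > x_{max} \\
x & x_{min} \le x \le x_{max} \\
x_{min} & x < x_{min}
\end{cases} \tag{3.2}
$$

where $S$ is applied to each vector element separately.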

The energy function is defined by:

$$ E(x) = -(x^T W x)/2 \tag{3.3} $$

$E(x)$ is fixed by the learning rule (through $W$), and its surfaces of constant energy form n-dimensional ellipses (section 3.1). Furthermore, each new vector $x_{k+1}$ represents a point on the energy surface with less energy than vector $x_k$ (section 3.2). This new vector is found by adding to the old vector a vector $\Delta x \propto W x_k$, which is always orthogonal to the surface of constant energy through $x_k$ (section 3.3). Finally, it is shown that the Hebbian learning rule yields a positive semi-definite weight matrix (section 3.4).
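The recall formula 3.1, the clipping function 3.2 and the energy 3.3 translate directly into C. The sketch below is a minimal illustration, not the implementation documented in appendix B; the dimension $N = 2$ and all names are hypothetical. All elements of $x_{k+1}$ are computed from $x_k$ before $x$ is overwritten.

```c
#include <assert.h>

#define N 2   /* dimension of the example */

/* Clipping function S of formula 3.2 for a single vector element. */
static double clip(double v, double xmin, double xmax) {
    if (v > xmax) return xmax;
    if (v < xmin) return xmin;
    return v;
}

/* Energy E(x) = -(x^T W x) / 2, formula 3.3. */
static double bsb_energy(double W[N][N], const double x[N]) {
    double e = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) e += x[i] * W[i][j] * x[j];
    return -0.5 * e;
}

/* One recall step x <- S(x + gamma * W x), formula 3.1.
   Returns 1 if any element changed, 0 if x is a fixed point. */
static int bsb_step(double W[N][N], double x[N], double gamma, double xmin, double xmax) {
    assert(gamma > 0.0 && xmin < xmax);
    double next[N];
    for (int i = 0; i < N; i++) {
        double wx = 0.0;
        for (int j = 0; j < N; j++) wx += W[i][j] * x[j];
        next[i] = clip(x[i] + gamma * wx, xmin, xmax);
    }
    int changed = 0;
    for (int i = 0; i < N; i++) {
        if (next[i] != x[i]) changed = 1;
        x[i] = next[i];
    }
    return changed;
}
```

With the weight matrix of section 4.2.1, $\gamma = 1$ and start vector $(0.5, 0.2)$, repeated calls move the state to the corner $(1, 1)$ of the box while the energy never increases.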


3.1  Energy  curves 

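Diagonalizing the weight matrix $W$ used in the energy function $E$ readily shows that the contours of $E$ are ellipses. Consider the isolines of $E$: sets of points $x$ that all have the same energy $c_1$. We use the eigendecomposition $W = S \Lambda S^{-1}$, where $S$ can be chosen orthogonal ($S = Q$, $Q^{-1} = Q^T$) because $W$ is symmetric, and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$:

$$
\begin{aligned}
E(x) &= c_1 \\
-(x^T W x)/2 &= c_1 \\
x^T W x &= c_2 \qquad (c_2 = -2c_1) \\
x^T S \Lambda S^{-1} x &= c_2 \\
x^T Q \Lambda Q^T x &= c_2 \\
y^T \Lambda y &= c_2 \qquad (y = Q^T x) \\
\lambda_1 y_1^2 + \lambda_2 y_2^2 + \ldots + \lambda_n y_n^2 &= c_2 \\
\frac{y_1^2}{(1/\sqrt{\lambda_1})^2} + \frac{y_2^2}{(1/\sqrt{\lambda_2})^2} + \ldots + \frac{y_n^2}{(1/\sqrt{\lambda_n})^2} &= c_2
\end{aligned} \tag{3.4}
$$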
Each $E(x) = c_1$ forms an N-dimensional ellipse. The length of the $i$-th axis is $2\sqrt{c_2}/\sqrt{\lambda_i}$, and each axis points along the $i$-th eigenvector of $W$. When $W$ is positive semi-definite, $E(x)$ has its maximum where $\nabla E(x) = 0$, namely $E(x) = 0$ at $x = 0$. Notice that this energy function is reversed compared with the ones studied in chapter 2. This also implies that the minima of $E$ lie on the ellipse¹ farthest away from $x = 0$, which therefore passes through the corners of the hypercube formed by formula 3.2. Now we can understand why the BSB model works. During learning, vectors are used that represent corners of the hypercube. These vectors become eigenvectors² of the weight matrix. During recall, these eigenvectors determine the shape of the ellipses, which are therefore directed towards the minima of the energy function.


3.2  BSB  implements  first  order  gradient  descent  minimizing  energy 

The following proof comes from [R.M. Golden 1986]. It shows that the energy associated with $x_{k+1}$ is less than the energy associated with $x_k$. This implies that repeated application of the recall formula ends in a vector that has minimal energy within its local neighbourhood. Because during learning these minima are associated with prototypes, we may expect recall to converge to a prototype when it starts within the neighbourhood of that prototype. The gradient of $E$ can be calculated (see appendix A) and is denoted $g_k$:

$$ \nabla E(x_k) = g_k = -W x_k \tag{3.5} $$

so, according to formula 2.18 with $s_k = W x_k$, the BSB recall formula implements a first-order gradient descent. We want to investigate the energy difference $\Delta$ between two successive vectors, and use a difference vector $d_k$:

$$ d_k = x_{k+1} - x_k \tag{3.6} $$

$$ \Delta = 2E(x_{k+1}) - 2E(x_k) \tag{3.7} $$

¹ So the minimum does not form a hole; instead, the maximum forms a hill.

² This depends on the learning rule. With Hebbian learning, the learned vectors are only eigenvectors if they are mutually orthogonal.
The following derivation gives the condition for $\Delta < 0$, i.e. for BSB to minimize the energy associated with its state vector:

$$
\begin{aligned}
\Delta &= 2E(x_{k+1}) - 2E(x_k) \\
&= 2E(d_k + x_k) - 2E(x_k) \\
&= -(d_k + x_k)^T W (d_k + x_k) + x_k^T W x_k \\
&= -d_k^T W d_k - d_k^T W x_k - x_k^T W d_k \\
&= -d_k^T W d_k - 2 d_k^T W x_k \\
&= -d_k^T W d_k + 2 d_k^T g_k
\end{aligned} \tag{3.8}
$$

So $\Delta < 0$ if $d_k^T g_k < (d_k^T W d_k)/2$.

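First look at $d_k^T g_k$. Introduce a function $\alpha(i,k)$ to rewrite formula 3.1 element by element:

$$ x_{k+1}(i) = x_k(i) - \gamma\,\alpha(i,k)\,g_k(i) \tag{3.9} $$

Consider the three regions of formula 3.2 and determine the range of $\alpha(i,k)$ in each. The first region:

$$ x_{min} \le x_k(i) - \gamma g_k(i) \le x_{max} \;\Rightarrow\; \alpha(i,k) = 1 \tag{3.10} $$

The second region:

$$
\begin{aligned}
& x_k(i) - \gamma g_k(i) > x_{max},\quad x_k(i) \le x_{max}
\;\Rightarrow\; x_{max} - x_k(i) < -\gamma g_k(i),\quad g_k(i) < 0 \\
& x_{k+1}(i) = x_{max} = x_k(i) - \gamma\,\alpha(i,k)\,g_k(i) \\
& \alpha(i,k) = \frac{x_{max} - x_k(i)}{-\gamma g_k(i)} \;\Rightarrow\; 0 \le \alpha(i,k) < 1
\end{aligned} \tag{3.11}
$$

The third region:

$$
\begin{aligned}
& x_k(i) - \gamma g_k(i) < x_{min},\quad x_k(i) \ge x_{min}
\;\Rightarrow\; x_{min} - x_k(i) > -\gamma g_k(i),\quad g_k(i) > 0 \\
& x_{k+1}(i) = x_{min} = x_k(i) - \gamma\,\alpha(i,k)\,g_k(i) \\
& \alpha(i,k) = \frac{x_{min} - x_k(i)}{-\gamma g_k(i)} \;\Rightarrow\; 0 \le \alpha(i,k) < 1
\end{aligned} \tag{3.12}
$$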
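Combining the three regions for $\alpha(i,k)$ gives:

$$ 0 \le \alpha(i,k) \le 1 \tag{3.13} $$

Now determine $d_k(i)$ and the inner product:

$$ d_k(i) = -\gamma\,\alpha(i,k)\,g_k(i) $$

$$ d_k^T g_k = -\gamma \sum_i \alpha(i,k)\,g_k(i)^2 \le 0 \tag{3.14} $$

And look at $(d_k^T W d_k)/2$. Since $\lambda_{min} |x|^2 \le x^T W x \le \lambda_{max} |x|^2$:

$$ (\lambda_{min} |d_k|^2)/2 \le (d_k^T W d_k)/2 \tag{3.15} $$

So when $\lambda_{min} \ge 0$, $d_k^T g_k < (d_k^T W d_k)/2$ for every $d_k \ne 0$. Otherwise, when $\lambda_{min} < 0$, by formula 3.15 it suffices that $d_k^T g_k < \lambda_{min} |d_k|^2 / 2$, which gives the condition:

$$
\begin{aligned}
-\gamma \sum_i \alpha(i,k)\,g_k(i)^2 &< \lambda_{min} \sum_i \left( \gamma\,\alpha(i,k)\,g_k(i) \right)^2 / 2 \\
\gamma &< \frac{2 \sum_i \alpha(i,k)\,g_k(i)^2}{|\lambda_{min}| \sum_i \left( \alpha(i,k)\,g_k(i) \right)^2}
\end{aligned} \tag{3.16}
$$

Because $\alpha(i,k)^2 \le \alpha(i,k)$ (formula 3.13), the right-hand side is at least $2/|\lambda_{min}|$, so it suffices that:

$$ \gamma < 2/|\lambda_{min}| $$

So when either $W$ is positive semi-definite or $\gamma < 2/|\lambda_{min}|$, BSB minimizes an energy function.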
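The step-size condition $\gamma < 2/|\lambda_{min}|$ is sufficient rather than necessary, but a small experiment shows why some bound is needed for an indefinite $W$. The sketch assumes a diagonal $W = \mathrm{diag}(0.5, -0.5)$, so $\lambda_{min} = -0.5$ and the bound is $\gamma < 4$; these values and the function names are hypothetical.

```c
#include <assert.h>

/* Clipping function S of formula 3.2 for the box [-1, 1]. */
static double clip1(double v) {
    return v > 1.0 ? 1.0 : (v < -1.0 ? -1.0 : v);
}

/* Energy of formula 3.3 for a diagonal W = diag(w[0], w[1]). */
static double energy_diag(const double w[2], const double x[2]) {
    return -0.5 * (w[0] * x[0] * x[0] + w[1] * x[1] * x[1]);
}

/* Runs `steps` BSB recall steps (formula 3.1) and returns 1 if the energy
   never increased, 0 if it increased at least once. For a diagonal W the
   product W x is elementwise, so the two elements can be updated in place. */
static int energy_monotone(const double w[2], double x[2], double gamma, int steps) {
    assert(gamma > 0.0 && steps >= 0);
    double e = energy_diag(w, x);
    int monotone = 1;
    for (int k = 0; k < steps; k++) {
        x[0] = clip1(x[0] + gamma * w[0] * x[0]);
        x[1] = clip1(x[1] + gamma * w[1] * x[1]);
        double e_new = energy_diag(w, x);
        if (e_new > e + 1e-12) monotone = 0;
        e = e_new;
    }
    return monotone;
}
```

Starting from $(0.1, 0.2)$, the energy never increases over 20 steps with $\gamma = 1$, whereas with $\gamma = 5$ the second element overshoots and changes sign at every step, and the energy increases at the third step.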
3.3  Gradient vector orthogonal to energy field

Choose a vector $a$ with $E(a) = c$. Also place on the surface $E(x) = c$ a curve $f(t) = (x_1(t), x_2(t), \ldots, x_n(t))$ such that $f(t_0) = a$. This leads to the following equalities:

$$
\begin{aligned}
E(f(t)) &= c \quad \forall t \\
\frac{dE(x_1(t), x_2(t), \ldots, x_n(t))}{dt} &= 0 \\
\nabla E(a)^T f'(t_0) &= 0
\end{aligned} \tag{3.17}
$$

So the gradient vector at $a$ is orthogonal to the derivative at $a$ of every such curve $f$, and must therefore be orthogonal to the surface $E(x) = c$ through $a$ (derivation from [J.H.J. Almering 1983]).


3.4  Hebbian weight matrix is positive semi-definite

We want to show that the matrix $W$ formed by Hebbian learning is positive semi-definite ($\lambda_{min} \ge 0$). All that has to be done is to show that $y^T W y \ge 0$ for every vector $y$ (see appendix A):

$$
\begin{aligned}
W &= x_1 x_1^T + x_2 x_2^T + \ldots + x_n x_n^T \\
y^T W y &= y^T (x_1 x_1^T + x_2 x_2^T + \ldots + x_n x_n^T)\, y \\
&= y^T \left( (x_1^T y)\, x_1 + (x_2^T y)\, x_2 + \ldots + (x_n^T y)\, x_n \right) \\
&= (x_1^T y)(y^T x_1) + (x_2^T y)(y^T x_2) + \ldots + (x_n^T y)(y^T x_n) \\
&= |x_1^T y|^2 + |x_2^T y|^2 + \ldots + |x_n^T y|^2 \;\ge\; 0
\end{aligned} \tag{3.18}
$$
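Formula 3.18 can be checked numerically: for a Hebbian $W$, the quadratic form $y^T W y$ equals the sum of squared projections $\sum_p (x_p^T y)^2$ and is therefore never negative. The sizes ($N = 4$ elements, $P = 3$ learned vectors) and the names below are assumptions for illustration.

```c
#include <assert.h>

#define N 4   /* vector length (example) */
#define P 3   /* number of learned vectors (example) */

/* Hebbian weight matrix W = sum_p x_p x_p^T (section 3.4). */
static void hebbian(double X[P][N], double W[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            W[i][j] = 0.0;
            for (int p = 0; p < P; p++) W[i][j] += X[p][i] * X[p][j];
        }
}

/* Quadratic form y^T W y. */
static double quad_form(double W[N][N], const double y[N]) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) s += y[i] * W[i][j] * y[j];
    return s;
}

/* Right-hand side of formula 3.18: sum_p (x_p^T y)^2. */
static double sum_squared_projections(double X[P][N], const double y[N]) {
    double s = 0.0;
    for (int p = 0; p < P; p++) {
        double proj = 0.0;
        for (int i = 0; i < N; i++) proj += X[p][i] * y[i];
        s += proj * proj;
    }
    return s;
}
```

For the learned vectors $(1, -1, 1, -1)$, $(1, 1, -1, -1)$, $(1, 1, 1, 1)$ and $y = (0.3, -2, 0.5, 1)$, both quantities equal $1.8^2 + 3.2^2 + 0.2^2 = 13.52$.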
4  BSB RADAR PULSE CLASSIFICATION

A radar receiver converts the received radar spectrum into a number of features, or fields, for each detected radar, for example the frequency field, the level field and the bearing field. The fields may be distorted by noise. We want two problems to be solved: first, correct removal of the noise, and second, an indication of the number of radars in the neighbourhood.

The radar features are scalars, whereas a BSB network processes vectors, so a transformation from scalars to vectors is necessary. Two transformations are discussed. Results of experiments are also presented and explained as far as possible.


4.1  Translation 

A translation converts a scalar into a vector and vice versa. Two translation methods are discussed: thermometer coding and reflected binary coding.


4.1.1  Thermometer  coding 

A thermometer is formed by a vector. The indicator of the thermometer is a block of vector elements that are ON (have a high value); all other elements are OFF (have a low value). When the scalar has a low value, the indicator of the thermometer is low (at the beginning, i.e. the left side, of the vector). As the scalar increases, the indicator moves up: the block of ON elements shifts from the beginning towards the end of the vector. Extreme scalar values need special treatment, because there the width of the indicator must decrease. A disadvantage of thermometer coding is that it uses only $n$ of the $2^n$ possible corner vectors of the n-dimensional hypercube. A reason for choosing it is that it offers a simple way to map scalars with slightly different values to vectors that are close to each other on the hypercube: their thermometer indications are almost the same, so their Hamming distance is low (they differ in few elements). The conversion formula, a discussion of the choice of the thermometer width, and the choice of the vector element values are presented below.
4.1.1.1  Conversion formula

For conversion from scalar to vector the following formula is used (with $s$ the scalar, $s_{max}$ the maximal field value, $s_{min}$ the minimal field value and $n$ the number of vector elements for that field; $mid$ is used as an integer index):

$$ mid = \frac{s - s_{min}}{s_{max} - s_{min}} \cdot n \tag{4.1} $$

Test whether there is space at the beginning of the vector (with $corrlow$ the correction for a low indication and $w$ the width of the block of ON elements, set to a constant value):

    corrlow = 0;
    if (mid < w/2)
        corrlow = w/2 - mid;                                      (4.2)


Test whether there is space at the end of the vector to add $w/2$ elements after $mid$ (with $corrhigh$ the correction for a high indication; vector indices run from 0 to $n-1$):

    corrhigh = 0;
    if (mid + w/2 > n - 1)
        corrhigh = mid + w/2 - (n - 1);                           (4.3)


Now $mid$ is used as an index into the vector $x$: around $mid$ a total of $w$ elements is set to $x_{max}$, unless $mid$ is at a border (with $x_{min}$ the minimal value of a vector element, $x_{max}$ the maximal value of a vector element and $fi$ the first index that must be set to $x_{max}$):

    fi = mid + corrlow - w/2;
    for (i = 0; i < fi; i++) x[i] = xmin;                                   (4.4)
    for (i = fi; i < fi + w - corrlow - corrhigh; i++) x[i] = xmax;
    for (i = fi + w - corrlow - corrhigh; i < n; i++) x[i] = xmin;
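Formulas 4.1 to 4.4 can be collected into one function. This is a sketch under stated assumptions, not the report's code: indices are 0-based, $w/2$ is integer division, and the behaviour was checked only for the odd width $w = 3$ of table 4.1; the function name is hypothetical.

```c
#include <assert.h>

/* Thermometer coding (formulas 4.1-4.4): converts scalar s into n vector
   elements x[0..n-1] with a block of w elements set to xmax, the rest xmin.
   w/2 is integer division, as in the formulas; indices are 0-based. */
static void thermometer_encode(double s, double smin, double smax, int n, int w,
                               double xmin, double xmax, double x[]) {
    assert(smin < smax && smin <= s && s <= smax);
    assert(n > 0 && w > 0 && w <= n);

    /* formula 4.1; multiplying before dividing keeps integer inputs exact */
    int mid = (int)((s - smin) * n / (smax - smin));

    int corrlow = 0;                        /* formula 4.2: block cut at the start */
    if (mid < w / 2)
        corrlow = w / 2 - mid;

    int corrhigh = 0;                       /* formula 4.3: block cut at the end */
    if (mid + w / 2 > n - 1)
        corrhigh = mid + w / 2 - (n - 1);

    int fi = mid + corrlow - w / 2;         /* first index set to xmax */
    int end = fi + w - corrlow - corrhigh;  /* one past the last index set to xmax */

    for (int i = 0; i < n; i++)             /* formula 4.4 */
        x[i] = (i >= fi && i < end) ? xmax : xmin;
}
```

Called with $s_{min} = 0$, $s_{max} = 11$, $n = 11$, $w = 3$, $x_{min} = -1$ and $x_{max} = 1$, it reproduces the vectors of table 4.1 for inputs 0 to 10.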
As an example of thermometer coding, see table 4.1.

    input   vector x[0] .. x[10]                   output
      0      1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1        1.0
      1      1  1  1 -1 -1 -1 -1 -1 -1 -1 -1        1.5
      2     -1  1  1  1 -1 -1 -1 -1 -1 -1 -1        2.5
      3     -1 -1  1  1  1 -1 -1 -1 -1 -1 -1        3.5
      4     -1 -1 -1  1  1  1 -1 -1 -1 -1 -1        4.5
      5     -1 -1 -1 -1  1  1  1 -1 -1 -1 -1        5.5
      6     -1 -1 -1 -1 -1  1  1  1 -1 -1 -1        6.5
      7     -1 -1 -1 -1 -1 -1  1  1  1 -1 -1        7.5
      8     -1 -1 -1 -1 -1 -1 -1  1  1  1 -1        8.5
      9     -1 -1 -1 -1 -1 -1 -1 -1  1  1  1        9.5
     10     -1 -1 -1 -1 -1 -1 -1 -1 -1  1  1       10.0

Table 4.1  Thermometer coding with $s_{min} = 0$, $s_{max} = 11$, $x_{min} = -1$, $x_{max} = 1$, $w = 3$, $n = 11$. The first column gives the scalar input, the second column the vector values, the last column the scalar output.


For the conversion from vector to scalar the following formula is used (with $val$ the index of the first element of the block of the thermometer indication, $w$ the number of ON vector elements the block consists of, $s_{max}$ the maximal field value, $s_{min}$ the minimal field value and $n$ the number of vector elements for that field):

$$ s = s_{min} + (val + w/2) \cdot (s_{max} - s_{min}) / n \tag{4.5} $$
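The reverse conversion of formula 4.5 can be sketched as follows. How an element is judged to be ON is not specified in the report; the sketch assumes an element is ON when it is closer to $x_{max}$ than to $x_{min}$, uses only the first block of ON elements, and returns $s_{min}$ when no element is ON. The function name is hypothetical.

```c
#include <assert.h>

/* Converts a thermometer-coded vector back to a scalar with formula 4.5:
   s = smin + (val + w/2) * (smax - smin) / n, where val is the index of the
   first ON element and w the number of ON elements actually present.
   Assumption: an element is ON when it exceeds the midpoint of xmin and xmax. */
static double thermometer_decode(const double x[], int n, double smin, double smax,
                                 double xmin, double xmax) {
    assert(n > 0 && smin < smax && xmin < xmax);
    double threshold = 0.5 * (xmin + xmax);
    int val = -1, w = 0;
    for (int i = 0; i < n; i++) {
        if (x[i] > threshold) {
            if (val < 0) val = i;   /* first element of the block */
            w++;
        } else if (val >= 0) {
            break;                  /* only the first block is used */
        }
    }
    if (val < 0) return smin;       /* assumption: no indication at all */
    return smin + (val + w / 2.0) * (smax - smin) / n;
}
```

For the first three rows of table 4.1 it returns 1.0, 1.5 and 2.5, and for the last row 10.0.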


4.1.1.2  Choice of thermometer width

The choice of thermometer width has consequences for the separation of vectors during recall. If we use Hebbian learning, only orthogonal vectors can be recalled correctly:

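$$
\begin{aligned}
W &= x_1 x_1^T + x_2 x_2^T + \ldots + x_j x_j^T + \ldots + x_n x_n^T \\
W x_j &= x_1 x_1^T x_j + x_2 x_2^T x_j + \ldots + x_j x_j^T x_j + \ldots + x_n x_n^T x_j \\
&= (x_1^T x_j)\, x_1 + (x_2^T x_j)\, x_2 + \ldots + (x_j^T x_j)\, x_j + \ldots + (x_n^T x_j)\, x_n \\
&= |x_j|^2\, x_j \qquad \text{provided } x_i^T x_j = 0 \text{ if } i \ne j
\end{aligned}
$$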

When vectors are formed according to the (−1,1) thermometer coding, they will generally not be orthogonal. This is not as bad as it may seem, because thanks to gradient descent recall, non-orthogonal vectors can be separated too. In fact the setting of $w$ determines the level of orthogonality between input vectors. For example, consider two vectors $y_1$ and $y_2$, each consisting of two fields of 11 elements. Give both fields of $y_1$ a low thermometer coding and both fields of $y_2$ a high coding, and calculate the inner product $y_1^T y_2$ for varying values of $w$:
When $w = 1$:

    y1 = (-1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1   -1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1)^T
    y2 = (-1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1   -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1)^T

the inner product $y_1^T y_2 = 14$.

When $w = 3$:

    y1 = ( 1  1  1 -1 -1 -1 -1 -1 -1 -1 -1    1  1  1 -1 -1 -1 -1 -1 -1 -1 -1)^T
    y2 = (-1 -1 -1 -1 -1 -1 -1 -1  1  1  1   -1 -1 -1 -1 -1 -1 -1 -1  1  1  1)^T

the inner product $y_1^T y_2 = -2$.

When $w = 4$:

    y1 = ( 1  1  1  1 -1 -1 -1 -1 -1 -1 -1    1  1  1  1 -1 -1 -1 -1 -1 -1 -1)^T
    y2 = (-1 -1 -1 -1 -1 -1 -1  1  1  1  1   -1 -1 -1 -1 -1 -1 -1  1  1  1  1)^T

the inner product $y_1^T y_2 = -10$.

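Varying $w$ thus varies the level of orthogonality. When we demand that non-correlated vectors are orthogonal, we can derive the optimal value of $w$ (for one field of $n$ elements, with $w \le n/2$):

$$
\begin{aligned}
y_1 &= (1_1 \ldots 1_w\ \ {-1}_{w+1} \ldots {-1}_n)^T \\
y_2 &= ({-1}_1 \ldots {-1}_{n-w}\ \ 1_{n-w+1} \ldots 1_n)^T \\
y_1^T y_2 &= (-w) + \left( (n-w+1) - (w+1) \right) + (-w) = n - 4w \\
y_1^T y_2 = 0 \;&\Rightarrow\; w = n/4
\end{aligned} \tag{4.7}
$$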

This  choice  of  w  leads  to  a  maximum  of  only  4  orthogonal  vectors,  independent  of  n. 
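The inner products in the example above, and formula 4.7, can be reproduced with a few lines of C. The sketch places the low block at the very start and the high block at the very end of each field (for $w = 1$ this differs from the positions shown above but gives the same inner product); the names are hypothetical.

```c
#include <assert.h>

#define FIELD 11    /* elements per field, as in the example above */
#define FIELDS 2    /* number of fields per vector */

/* (-1,1) code with a block of w ON elements at the start (high == 0) or at
   the end (high != 0) of every field. */
static void block_code(int w, int high, int y[FIELDS * FIELD]) {
    assert(w > 0 && w <= FIELD);
    for (int f = 0; f < FIELDS; f++)
        for (int i = 0; i < FIELD; i++) {
            int on = high ? (i >= FIELD - w) : (i < w);
            y[f * FIELD + i] = on ? 1 : -1;
        }
}

static int inner_product(const int a[], const int b[], int len) {
    int s = 0;
    for (int i = 0; i < len; i++) s += a[i] * b[i];
    return s;
}
```

For $w = 1, 3, 4$ it gives $14$, $-2$ and $-10$, i.e. $2(n - 4w)$ for two fields of $n = 11$ elements.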

4.1.1.3  Choice of the value of vector elements

In principle two codings are possible: (0,1) coding and (−1,1) coding. Below it is shown that (0,1) coding results in $W$ matrices that are band matrices and therefore prevent some prototype recollection (there is no information outside the band), which leads to the choice of (−1,1) coding.
Consider a vector $y$ with thermometer width $w = 3$ consisting of 9 elements, (0,1) coded with a block of ones starting at index 4, $y = (0\ 0\ 0\ 1\ 1\ 1\ 0\ 0\ 0)^T$, and look at $W$:

$$
W = y\,y^T =
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
$$

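By varying the start of the indication for different vectors $y$ while keeping $w = 3$, the block of ones in $W$ shifts along the diagonal, and the following $W$ results (each x denotes a matrix element that may have a value $\ne 0$):

$$
W =
\begin{pmatrix}
x & x & x & 0 & 0 & 0 & 0 & 0 & 0 \\
x & x & x & x & 0 & 0 & 0 & 0 & 0 \\
x & x & x & x & x & 0 & 0 & 0 & 0 \\
0 & x & x & x & x & x & 0 & 0 & 0 \\
0 & 0 & x & x & x & x & x & 0 & 0 \\
0 & 0 & 0 & x & x & x & x & x & 0 \\
0 & 0 & 0 & 0 & x & x & x & x & x \\
0 & 0 & 0 & 0 & 0 & x & x & x & x \\
0 & 0 & 0 & 0 & 0 & 0 & x & x & x
\end{pmatrix}
$$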


Now take a vector $y_s$ of $n$ elements and width $w$ (with $s$ the index of the first ON bit):

$$ y_s = (0_1 \ldots 0_{s-1}\ \ 1_s \ldots 1_{s+w-1}\ \ 0_{s+w} \ldots 0_n)^T \tag{4.8} $$

Matrix $W$ is formed by summing $y_s y_s^T$ for $1 \le s \le n$. Look at $W y_s$ for one $s$. The outermost elements of the block ($1_s$ and $1_{s+w-1}$) can activate a sub-block in $W y_s$:

$$ W y_s = (0_1 \ldots 0_{s-w}\ \ x_{s-w+1} \ldots x_{s+2w-2}\ \ 0_{s+2w-1} \ldots 0_n)^T \tag{4.9} $$

and $W y_s$ is always limited to a band of $3w-2$ elements.
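The band limit of formula 4.9 can be verified numerically for (0,1) coding. The sketch assumes $n = 20$ elements, $w = 3$, and a Hebbian $W$ summed over all complete blocks; it measures the span from the first to the last nonzero element of $W y_s$. Sizes and names are hypothetical.

```c
#include <assert.h>

#define N 20      /* vector length (example value) */
#define WIDTH 3   /* thermometer width w (example value) */

/* (0,1) thermometer vector y_s: ones at indices s .. s+WIDTH-1 (0-based). */
static void y_vec(int s, int y[N]) {
    for (int i = 0; i < N; i++) y[i] = (i >= s && i < s + WIDTH) ? 1 : 0;
}

/* Hebbian matrix M = sum_s y_s y_s^T over all complete blocks s = 0 .. N-WIDTH. */
static void hebbian_band(int M[N][N]) {
    int y[N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) M[i][j] = 0;
    for (int s = 0; s <= N - WIDTH; s++) {
        y_vec(s, y);
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) M[i][j] += y[i] * y[j];
    }
}

/* Span from the first to the last nonzero element of M y_s (0 if all zero). */
static int support_span(int M[N][N], int s) {
    assert(s >= 0 && s <= N - WIDTH);
    int y[N], first = -1, last = -1;
    y_vec(s, y);
    for (int i = 0; i < N; i++) {
        int v = 0;
        for (int j = 0; j < N; j++) v += M[i][j] * y[j];
        if (v != 0) {
            if (first < 0) first = i;
            last = i;
        }
    }
    return first < 0 ? 0 : last - first + 1;
}
```

For every $s$ whose band is not cut off by the ends of the vector, the span is $3w - 2 = 7$.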


4.1.2  Reflected  binary  coding 

The thermometer code uses only $n$ of the $2^n$ corners of the hypercube as possible code vectors. The reflected binary code (also called Gray code) uses every corner [A.P. Thijssen, H.A. Vink, C.H. Eversdijk 1982]. Scalar values that differ slightly are still translated into vectors with a low Hamming distance.

Using the thermometer code, the Hamming distance between two vectors, each representing a scalar, increases linearly with the difference between the scalars. Using reflected binary coding, the Hamming distance between two vectors representing two different scalars can also increase when the difference between the scalars increases, but not linearly. The distance $d(0,1)$ between the vectors representing scalars 0 and 1 is 1, as can be seen in table 4.2. It can also be seen that:

$$
\begin{aligned}
d(0,1) &= d(0,3) \\
d(0,2) &= d(0,4) \\
d(0,2) &> d(0,3)
\end{aligned} \tag{4.10}
$$

4.1.2.1  Conversion formula

For conversion from scalar to vector the following formula is used (with $0 \le s < 2^n$ the scalar, $r$ an intermediate number and $x = x_{n-1} \ldots x_0$ the vector):

$$ r = s \bmod 2^i \qquad (i = 2, 3, \ldots, n+1) $$

$$
x_{i-2} = \begin{cases}
x_{min} & \text{if } 0 \le r < 2^{i-2} \\
x_{max} & \text{if } 2^{i-2} \le r < 2^{i-2} + 2^{i-1} \\
x_{min} & \text{if } r \ge 2^{i-2} + 2^{i-1}
\end{cases} \tag{4.11}
$$
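Formula 4.11 can be sketched in C as below (the function name is hypothetical). Element $x_k$ is stored in `x[k]`, so the column order $x_3 \ldots x_0$ of table 4.2 corresponds to `x[3]` down to `x[0]`; the result agrees with the familiar bit expression $s \oplus (s \gg 1)$ for the binary reflected Gray code.

```c
#include <assert.h>

/* Reflected binary (Gray) coding of an integer scalar 0 <= s < 2^n into n
   vector elements, formula 4.11. x[k] holds element x_k (x[n-1] first in the
   report's notation x_{n-1} .. x_0). */
static void gray_encode(unsigned s, int n, double xmin, double xmax, double x[]) {
    assert(n > 0 && n < 31 && s < (1u << n));
    for (int i = 2; i <= n + 1; i++) {
        unsigned k = (unsigned)(i - 2);       /* element index x_{i-2} */
        unsigned r = s % (1u << i);           /* r = s mod 2^i */
        unsigned lo = 1u << k;                /* 2^(i-2) */
        unsigned hi = lo + (1u << (k + 1));   /* 2^(i-2) + 2^(i-1) */
        x[k] = (r >= lo && r < hi) ? xmax : xmin;
    }
}
```

With $n = 4$, $x_{min} = -1$ and $x_{max} = 1$, it reproduces table 4.2 for inputs 0 to 10.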

An  example  of  Gray  coding  is  shown  in  table  4.2. 


    input   x3 x2 x1 x0
      0     -1 -1 -1 -1
      1     -1 -1 -1  1
      2     -1 -1  1  1
      3     -1 -1  1 -1
      4     -1  1  1 -1
      5     -1  1  1  1
      6     -1  1 -1  1
      7     -1  1 -1 -1
      8      1  1 -1 -1
      9      1  1 -1  1
     10      1  1  1  1

Table 4.2  Gray coding with $s_{min} = 0$, $s_{max} = 11$, $x_{min} = -1$, $x_{max} = 1$, $n = 4$. The first column gives the scalar input (which equals the scalar output), the second column gives the vector values.
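For the reverse conversion the following formula is used (with $x = x_{n-1}\,x_{n-2} \ldots x_0$ the reflected binary coded vector, $s_{n-1}\,s_{n-2} \ldots s_0$ the resulting binary coded vector and $s$ the corresponding scalar):

$$ s_{n-1} = x_{n-1} $$

$$
s_i = \begin{cases}
x_{max} & \text{if } s_{i+1} = x_{max} \text{ and } x_i = x_{min} \\
x_{min} & \text{if } s_{i+1} = x_{max} \text{ and } x_i = x_{max} \\
x_i & \text{if } s_{i+1} = x_{min}
\end{cases} \qquad (i = n-2, n-3, \ldots, 0)
$$

$$
b(s_i) = \begin{cases} 1 & \text{if } s_i = x_{max} \\ 0 & \text{if } s_i = x_{min} \end{cases}
$$

$$ s = b(s_{n-1})\,2^{n-1} + b(s_{n-2})\,2^{n-2} + \ldots + b(s_0)\,2^0 $$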
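The reverse conversion becomes a loop from the most significant element down, keeping track of $s_{i+1}$. A sketch, assuming the vector elements are exactly $x_{min}$ or $x_{max}$ (the function name is hypothetical, not the report's code):

```c
#include <assert.h>

/* Inverse of the reflected binary code. x[k] holds element x_k, with x[n-1]
   the most significant. s_{n-1} = x_{n-1}; each further s_i equals x_i,
   inverted when s_{i+1} is ON. The scalar is sum_i b(s_i) 2^i.
   Assumption: every element is exactly xmin or xmax. */
static unsigned gray_decode(const double x[], int n, double xmax) {
    assert(n > 0 && n < 32);
    unsigned s = 0;
    int prev_on = 0;                               /* b(s_{i+1}); treated as OFF for i = n-1 */
    for (int i = n - 1; i >= 0; i--) {
        int xi_on = (x[i] == xmax);
        int si_on = prev_on ? !xi_on : xi_on;      /* invert when s_{i+1} = xmax */
        s |= (unsigned)si_on << i;
        prev_on = si_on;
    }
    return s;
}
```

For the row of scalar 7 in table 4.2, $x_3 \ldots x_0 = (-1, 1, -1, -1)$, it returns 7.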

A few remarks on the learning algorithm used. When Hebbian learning is used, the input patterns should be almost orthogonal. This limits the network to $n$ distinguishable vectors. When delta learning is used, every set of $n$ linearly independent vectors can be used. So although the n-dimensional hypercube has $2^n$ corners and the reflected binary code uses each of them, only $n$ of them can actually be associated with a prototype in the standard BSB model. An extension might be to develop a new BSB model with hidden units and a back-propagation learning process. In fact only $n-1$ linearly independent vectors can usefully be learned by an auto-associative system: if $n$ vectors are learned so that $W x_i = \lambda x_i$ $(i = 1, 2, \ldots, n)$, the complete set of eigenvectors is specified, so the only solution is $W = \lambda I$, a multiple of the identity (see also section 4.2.2).

4.1 .2.2  Choice  of  value  of  vector  elements 

Again, (0,1) coding is not a good choice. With the reflected binary code, not every vector has the same number of ON elements. When both a vector with few ON elements and a vector with many ON elements are learned, the first vector gets a smaller eigenvalue (determined by the length of the input vector) and is therefore learned much more weakly.
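A small numerical example makes the effect concrete. The sketch assumes Hebbian outer-product learning with unit learning rate, W = Σ x x^T (the report's own learning rule is defined earlier and may be scaled differently), and uses the (0,1) Gray codes of s = 1 and s = 4 from a 4-bit code. Because these two vectors share no ON element, each is an eigenvector of W, and its eigenvalue equals its number of ON elements, i.e. its squared length.

```python
import numpy as np


def hebbian(patterns):
    """Hypothetical helper: assumed Hebbian learning W = sum_k x_k x_k^T (unit rate)."""
    n = len(patterns[0])
    w = np.zeros((n, n))
    for p in patterns:
        x = np.asarray(p, dtype=float)
        w += np.outer(x, x)
    return w


a = np.array([0, 0, 0, 1], dtype=float)  # Gray code of s = 1 in (0,1) coding: 1 ON element
b = np.array([0, 1, 1, 0], dtype=float)  # Gray code of s = 4 in (0,1) coding: 2 ON elements
W = hebbian([a, b])

print(W @ a)  # equals 1 * a: eigenvalue 1
print(W @ b)  # equals 2 * b: eigenvalue 2
```

With ±1 coding every vector has squared length n, so this imbalance does not arise.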


4.2  Results  of  experiments 

First, a simple two-dimensional BSB model is used to verify the theory developed so far. Then an experiment is done that tests the classification possibilities of the BSB network.
4.2.1 Two-dimensional BSB

In this section the experiment that Anderson published in [J.A. Anderson 1983] is repeated. The purpose is to study the recall field of a two-dimensional BSB model. This field is a two-dimensional plane, bounded by x_min = -1 and x_max = 1 for both vector elements. The weight matrix is:


$$W = \begin{pmatrix} 0.035 & -0.005 \\ -0.005 & 0.035 \end{pmatrix}$$


Figure 4.1 shows this recall field. Table 4.3 gives the number of steps needed for convergence from each grid point. Because W is positive definite (W > 0), γ can be chosen large to reduce the number of steps; see table 4.4.
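The recall field can be reproduced with a few lines of Python. The sketch assumes the usual BSB update x ← LIMIT(x + γWx), with LIMIT clipping each element to [x_min, x_max] = [-1, 1]; it is a stand-in for formula 3.1, which is not repeated here, and the function name bsb_recall is hypothetical. It counts how many updates a start point needs before the state stops changing.

```python
import numpy as np

W = np.array([[0.035, -0.005],
              [-0.005, 0.035]])


def bsb_recall(x0, w, gamma, x_min=-1.0, x_max=1.0, max_steps=10_000):
    """Assumed BSB recall: x <- clip(x + gamma * W x, x_min, x_max).

    Returns the stable state and the number of updates, counting the last
    update, which only confirms that the state no longer changes."""
    x = np.asarray(x0, dtype=float)
    for step in range(1, max_steps + 1):
        x_new = np.clip(x + gamma * (w @ x), x_min, x_max)
        if np.array_equal(x_new, x):
            return x, step
        x = x_new
    raise RuntimeError("no convergence within max_steps")


for gamma in (1, 1000):
    state, steps = bsb_recall([0.5, 0.2], W, gamma)
    print(f"gamma={gamma}: converged to {state} after {steps} updates")
```

Starting from (0.5, 0.2), both values of γ end in the corner (1, 1), but with γ = 1000 the first update already reaches it, while γ = 1 needs many small steps; this is the effect shown by tables 4.3 and 4.4.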


Figure 4.1 Recall field for a two-dimensional BSB model. At each grid point the gradient is drawn.

Figure 4.1 also shows the ellipses E(x) = c for different values of c (c = 0, c = -0.005, c = -0.01, ...). The gradient vectors are indeed orthogonal to these ellipses.
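The orthogonality follows directly from the derivative rules in the linear algebra appendix. With the energy E(x) = -x^TWx/2 and a symmetric W, as here,

$$\nabla E(x) = -\tfrac{1}{2}\,(W + W^T)\,x = -Wx,$$

so the gradient at x is the normal of the level curve E(x) = c passing through x. For a positive definite W these level curves are the ellipses of figure 4.1, and, assuming the BSB update moves in the direction Wx = -∇E(x), each step goes downhill in energy.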
Table 4.3 Number of steps necessary for convergence with γ = 1.

The values in table 4.3 are perhaps easier to understand in a landscape plot, where the height of the landscape at grid coordinate (x,y) is the number of steps needed for (x,y) to converge; see figure 4.2.


Figure 4.2 Table 4.3 as a landscape.
Table 4.4 Number of steps necessary for convergence with γ = 1000.

4.2.2  Clustering 

For the experiments the following parameters are used. The neural network consists of 22 fully connected neurons. Each radar signal consists of two fields, and each field serves as input for 11 neurons (w = 3, n = 11 for each field). Each experiment is done four times: first each field is thermometer coded with a thermometer width of 3, using both Hebbian and delta learning; then each field is Gray coded, again using both Hebbian and delta learning. During recall, every possible input vector in the input range is generated, and the stable output vector belonging to it is drawn in a two-dimensional plot. For simplicity both the x and the y field values lie between 0 and 11 (so (0,0) corresponds to the upper left corner and (10,10) to the lower right corner; s_max = 11, s_min = 0). When delta learning is used, Lcoef is set to 0.05 and each pattern is learned 5 times, after which another, randomly chosen pattern is learned 5 times. This is repeated until all patterns have been learned.

4.2.2.1 One radar

The learned radar is at (1,1)¹. See figure 4.3 for the recall diagram. In this figure and in all following figures, the arrow at each grid point (x,y) points to the stable grid point (v,w) to which (x,y) converges when formula 3.1 is applied recursively.


¹ Recall values will be at (1.5, 1.5); see table 4.1.
Figure 4.3 Recall diagram for one radar consisting of two fields, learned at (1,1) using thermometer coding.

Points (.) in the plot represent vectors whose recall vector equals the original vector. There are two kinds of these vectors. The first is the radar point (1,1), which has been learned as an eigenvector, so Wx = λx with λ = 22. Because only one vector is learned, there are 21 eigenvectors with λ = 0 (λ = 0 has multiplicity 21). This implies that there is a 21-dimensional subspace of vectors with Wx = 0. These vectors form the second group of equilibrium points. The energy of these vectors is E(x) = -(x^TWx)/2 = -(x^T0)/2 = 0. The diagram also shows another interesting feature: although no radar is learned at (7,7), many arrows point towards this point. Again the explanation is simple. When Wx = λx, then also W(-x) = -Wx = -λx = λ(-x); that is, when x is an eigenvector, -x is an eigenvector as well. We have learned the following vector x, so -x is an eigenvector too:

$$x = x_{(1,1)} = (1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,\;1,1,1,-1,-1,-1,-1,-1,-1,-1,-1)^T$$

$$-x = (-1,-1,-1,1,1,1,1,1,1,1,1,\;-1,-1,-1,1,1,1,1,1,1,1,1)^T$$

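The radar corresponding to -x follows from formula 4.5; in each field of -x the ON block starts at element val = 3 and has width w = 8:

$$\mathrm{rad} = s_{\min} + \left(\mathrm{val} + \frac{w}{2}\right)\frac{s_{\max} - s_{\min}}{n} = \left(\left(3 + \tfrac{8}{2}\right)\tfrac{11}{11},\; \left(3 + \tfrac{8}{2}\right)\tfrac{11}{11}\right) = (7,7) \qquad (4.13)$$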
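Formula 4.5 can be checked directly on the two fields above. The helper decode_field below is hypothetical; it assumes that the ON (+1) elements of a field form one contiguous block, with val the index of its first element and w its length, and uses s_min = 0, s_max = 11, n = 11 as in this experiment.

```python
def decode_field(field, s_min=0.0, s_max=11.0):
    """Formula 4.5: rad = s_min + (val + w/2) * (s_max - s_min) / n.

    Hypothetical helper; assumes the ON (+1) elements form one contiguous
    block, with val the index of its first element and w its length."""
    n = len(field)
    on = [i for i, v in enumerate(field) if v == 1]
    val, w = on[0], len(on)
    return s_min + (val + w / 2) * (s_max - s_min) / n


x_field = [1, 1, 1] + [-1] * 8     # one field of x = x(1,1), thermometer width 3
neg_field = [-v for v in x_field]  # the same field of -x: ON block of width 8 at val = 3

print(decode_field(x_field))    # 1.5
print(decode_field(neg_field))  # 7.0
```

The first result is the recall value 1.5 mentioned in footnote 1; the second is the spurious radar at 7.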

Also notice that the energy of the eigenvector -x equals

$$E(-x) = -\frac{(-x)^T W (-x)}{2} = -\frac{x^T W x}{2} = E(x) \qquad (4.14)$$
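The eigenstructure and the energy symmetry can be verified numerically. The sketch assumes that Hebbian learning of the single vector x with unit learning rate gives W = xx^T, which matches the eigenvalue λ = 22 stated above; another scaling would change λ but not the conclusions.

```python
import numpy as np

field = [1, 1, 1] + [-1] * 8
x = np.array(field + field, dtype=float)  # x(1,1): two thermometer-coded fields, 22 elements
W = np.outer(x, x)                        # assumed Hebbian weight matrix for one learned vector


def energy(v):
    """BSB energy E(v) = -(v^T W v) / 2."""
    return -(v @ W @ v) / 2


eigvals = np.linalg.eigvalsh(W)               # W is symmetric
print(np.allclose(W @ x, 22 * x))             # True: x is an eigenvector with lambda = 22
print(np.allclose(W @ (-x), 22 * (-x)))       # True: so is -x
print(int(np.sum(np.isclose(eigvals, 0.0))))  # 21 zero eigenvalues
print(energy(x), energy(-x))                  # equal energies (both -242), as in (4.14)
```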
This leads to two remarks, since we do not want to reach equilibrium points that do not result from x (the learned prototypes). First, in this example it is easy to escape from points with Wx = 0. Because E(x) = 0, these points lie at the top of the energy landscape, and when one element of x is changed so that E(x) ≠ 0, x will converge to a learned prototype. This amounts to a simple form of simulated annealing [P.J.M. Laarhoven, E.H.L. Aarts 1988]: x is given a little push, so that it rolls down from the top of the hill. For the second kind of unwanted equilibrium, the inverted prototypes -x, the obvious solution is to invert the vector when the recalled thermometer width is greater than the expected thermometer width.
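The inversion remedy can be sketched as a post-processing step on each recalled field. The function below is a hypothetical illustration, not taken from the report's software: it counts the ON (+1) elements and flips the field when there are more of them than the thermometer width w allows.

```python
def correct_inversion(field, width):
    """Hypothetical post-processing of one recalled field.

    If more elements are ON (+1) than the thermometer width allows, the recall
    is assumed to have ended in an inverted prototype -x, and the field is flipped."""
    if sum(1 for v in field if v == 1) > width:
        return [-v for v in field]
    return list(field)


recalled = [-1, -1, -1] + [1] * 8       # a field of -x(1,1): 8 ON elements instead of 3
print(correct_inversion(recalled, 3))   # [1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1]
```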

Greenberg [H.J. Greenberg 1988] has studied the equilibria of the BSB neural model. One result is that when W is strongly diagonal-dominant², each extreme point (corner of the hypercube) is stable: it can be proven that, when starting in the neighbourhood of an extreme point, BSB converges to that extreme point. The other result is that when W is strongly diagonal-dominant and x is a fixed point that is not an extreme point, then x is not stable. This is the case for the points with Wx = 0.
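The criterion of footnote 2 is easy to test mechanically. The sketch reads it in the usual way, with absolute values of the off-diagonal elements (w_ii > Σ_{j≠i} |w_ij|); that reading is an assumption about the exact definition Greenberg uses, and the function name is hypothetical.

```python
import numpy as np


def strongly_diagonal_dominant(w):
    """Assumed definition: every diagonal element exceeds the sum of the
    absolute values of the other elements in its row."""
    w = np.asarray(w, dtype=float)
    diag = np.diag(w)
    off = np.sum(np.abs(w), axis=1) - np.abs(diag)
    return bool(np.all(diag > off))


W2 = np.array([[0.035, -0.005], [-0.005, 0.035]])  # weight matrix of section 4.2.1
field = [1, 1, 1] + [-1] * 8
x = np.array(field + field, dtype=float)
W_hebb = np.outer(x, x)                             # one radar, Hebbian (unit rate assumed)

print(strongly_diagonal_dominant(W2))      # True
print(strongly_diagonal_dominant(W_hebb))  # False
```

For the two-dimensional matrix of section 4.2.1 the test succeeds, so by Greenberg's result all four corners of the square are stable; the Hebbian matrix of the one-radar experiment fails it (diagonal 1 against an off-diagonal sum of 21).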

In  figure  4.4  the  recall  field  is  shown  when  Gray  coding  is  used.  Each  input  vector  is  recalled 
perfectly. 
Figure 4.4 Recall diagram for one radar (1,1) consisting of two fields, using Gray coding.


² W is strongly diagonal-dominant when each diagonal element is larger than the sum of the absolute values of the other elements in its row.
4.2.2.2 Two radars

Two radars are learned, at (1,1) and (9,9). See figure 4.5 for the recall diagrams.
Figure 4.5 Recall diagram for two radars (1,1), (9,9), using thermometer coding and Hebbian learning (left) or delta learning (right).

As expected, the region above the diagonal converges to (1,1) and the region below the diagonal to (9,9). On the diagonal lie equilibrium points. These extra points are (0,10), (1,9), (5,5), (9,1) and (10,0), for both Hebbian and delta learning. At each of these points the gradient vector has the direction (checked only for Hebbian learning):

$$(0,0,0,1,1,1,1,1,0,0,0,\;0,0,0,1,1,1,1,1,0,0,0)^T.$$

The gradient has a zero component in the direction of each prototype. This results in the vector

$$(-1,-1,-1,1,1,1,1,1,-1,-1,-1,\;-1,-1,-1,1,1,1,1,1,-1,-1,-1)^T,$$

or scalar (5.5, 5.5).

See figure 4.6 for the recall field when translation is done with Gray coding. It appears that (1,9), (6,6) and (9,1) are extra equilibrium points when Hebbian learning is used.
Figure 4.6 Recall diagram for two radars (1,1), (9,9), using Gray coding and Hebbian learning (left) or delta learning (right).

From table 4.2 it can be derived that for s = 4, 5, 6 and 7 the corresponding vector has the same Hamming distance to s = 1 and to s = 9. When a field has one of these values, the output is s = 6, which has the minimal Hamming distance, 1, to both s = 1 and s = 9. When delta learning is used, the extra equilibrium points disappear and are replaced by convergence to the radar (1,1).
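The Hamming-distance argument can be checked with a short script. It assumes the 4-bit reflected binary code of table 4.2, G(s) = s XOR (s >> 1); for field values up to 10 any higher-order Gray bits are the same for every value, so they do not change the distances.

```python
def gray(s):
    """Reflected binary (Gray) code of s as an integer bit pattern."""
    return s ^ (s >> 1)


def hamming(a, b):
    """Number of differing bits between two bit patterns."""
    return bin(a ^ b).count("1")


# Field values 0..10 whose code is equally far from the codes of s = 1 and s = 9
equidistant = [s for s in range(11)
               if hamming(gray(s), gray(1)) == hamming(gray(s), gray(9))]
print(equidistant)                                          # [4, 5, 6, 7]
print({s: hamming(gray(s), gray(1)) for s in equidistant})  # {4: 3, 5: 2, 6: 1, 7: 2}
```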

4.2.2.3 Three radars

The third radar is placed at (8,8). Figure 4.7 shows that the diagonal has moved towards the upper left corner, for both Hebbian and delta learning. The gradient is (checked only for Hebbian learning):

$$(42,42,42,30,30,30,30,-6,-42,-42,-6,\;42,42,42,30,30,30,30,-6,-42,-42,-6)^T$$

This gives rise to the vector

$$(1,1,1,1,1,1,1,-1,-1,-1,-1,\;1,1,1,1,1,1,1,-1,-1,-1,-1)^T,$$

or (3.5, 3.5).
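The step from gradient to vector can be sketched as follows. The sketch assumes that each element is driven to the limit whose sign matches its gradient component, and decodes each field with formula 4.5 (s_min = 0, s_max = 11, n = 11, ON elements assumed contiguous); the names are hypothetical.

```python
import numpy as np

half = [42, 42, 42, 30, 30, 30, 30, -6, -42, -42, -6]  # gradient of one field
grad = np.array(half + half, dtype=float)

# Assumption: each element saturates at the limit whose sign matches its gradient component
v = np.sign(grad)


def decode(field, s_min=0.0, s_max=11.0):
    """Formula 4.5, assuming the ON (+1) elements form one contiguous block."""
    on = np.flatnonzero(field == 1)
    return s_min + (on[0] + len(on) / 2) * (s_max - s_min) / len(field)


print(v[:11])                                        # first field of the resulting vector
print([float(decode(f)) for f in (v[:11], v[11:])])  # [3.5, 3.5]
```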
Figure 4.7 Recall diagram for 3 radars (1,1), (9,9), (8,8), using thermometer coding and Hebbian learning (left) or delta learning (right).


Figure  4.8  shows  the  recall  diagram  for  Gray  coding.  Only  one  stable  point  (9,9)  exists  when 
Hebbian  learning  is  used.  Inspection  of  the  weight  matrix  (for  one  field): 
So each Wx will have the direction (1, -1, 1, 1, -1, -1, -1, -1, -1, -1, -1)^T, or scalar 9.
Figure 4.8 Recall diagram for 3 radars (1,1), (9,9), (8,8), using Gray coding and Hebbian learning (left) or delta learning (right).

4.2.2.4 Four radars

Figures 4.9 and 4.10 show the recall fields when 4 radars are learned. These diagrams correspond to the diagrams for 2 learned radars.
Figure 4.9 Recall diagram for 4 radars (1,1), (1,9), (9,1), (9,9), using thermometer coding and Hebbian learning (left) or delta learning (right).
Figure 4.10 Recall diagram for 4 radars (1,1), (1,9), (9,1), (9,9), using Gray coding and Hebbian learning (left) or delta learning (right).

When Hebbian learning is used, extra stable equilibrium points are formed on each separation line between two radars. With delta-rule learning these extra points also converge to a learned radar³.

4.2.2.5 How many radars can be learned?

For the last experiment it is verified whether a network of n neurons can learn n-1 prototypes. The network used consists of 8 fully connected neurons for one field. Only delta-rule learning and Gray coding are used. The following seven prototypes are learned: 12, 20, 45, 63, 128, 211 and 233. Lcoef is 0.05 and each pattern is learned 7 times. The following weight matrix is formed:
It can be verified that the seven prototypes are indeed eigenvectors of this matrix, with λ = 8 for each vector.


³ To which prototypes these points converge deserves further attention, because the chosen prototype does not always appear to be a fair one.


Now an eighth prototype (with a value of 254) is added to the set, and the whole set is learned again (starting with W = 0). The resulting weight matrix is W = 8I. So the system is not able to form an nth prototype eigenvector: with W = 8I every vector is an eigenvector with λ = 8, so no prototype is distinguished.
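The experiment can be imitated in a few lines. The sketch is a hedged reconstruction, not the report's LRN program: it assumes the delta rule ΔW = Lcoef·(λx - Wx)x^T with target eigenvalue λ = n = 8, Gray codes as ±1 vectors with the most significant bit first, and it repeats the presentation cycle (each pattern 7 times, Lcoef = 0.05) until every prototype satisfies Wx = λx, rather than stopping after a single cycle. All function names are hypothetical.

```python
import numpy as np


def gray_vector(s, n=8):
    """Gray code of s as a +/-1 vector, most significant bit first."""
    g = s ^ (s >> 1)
    return np.array([1.0 if (g >> i) & 1 else -1.0 for i in range(n - 1, -1, -1)])


def delta_learn(patterns, lam, lcoef=0.05, reps=7, max_epochs=20_000, tol=1e-9):
    """Assumed delta rule W += lcoef * (lam * x - W x) x^T, starting from W = 0.

    Each pattern is presented `reps` times in turn; the cycle is repeated until
    every pattern satisfies W x = lam * x to within `tol`."""
    n = len(patterns[0])
    w = np.zeros((n, n))
    for _ in range(max_epochs):
        for x in patterns:
            for _ in range(reps):
                w += lcoef * np.outer(lam * x - w @ x, x)
        if max(np.max(np.abs(w @ x - lam * x)) for x in patterns) < tol:
            return w
    raise RuntimeError("delta rule did not converge")


seven = [gray_vector(s) for s in (12, 20, 45, 63, 128, 211, 233)]
W7 = delta_learn(seven, lam=8.0)
W8 = delta_learn(seven + [gray_vector(254)], lam=8.0)

print(all(np.allclose(W7 @ x, 8 * x) for x in seven))  # True: lambda = 8 for each prototype
print(np.allclose(W8, 8 * np.eye(8), atol=1e-6))       # True: W = 8I
```

Under these assumptions the seven prototypes end up as eigenvectors with λ = 8 (W is then 8 times the projection onto their span), and adding the eighth, linearly independent, prototype leaves no freedom: W converges to 8I.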

Looking again at the weight matrix formed by learning the seven prototypes shows that it is strongly diagonal-dominant. This implies that each hypercube corner is stable. As a consequence, BSB recall will not converge to one of the prototypes, but will instead stop at any hypercube corner it reaches, since each corner is stable.
5 CONCLUSIONS AND FURTHER RESEARCH

The BSB neural network implements a first-order gradient descent algorithm, capable of solving nonlinear optimization problems such as radar pulse classification. A first step has been taken towards the implementation of radar pulse classification with a BSB network.

The  experiments  show  that  the  BSB  network  is  able  to  remove  noise  from  signals,  if  the  original 
signals  are  learned  in  a  separate  learning  phase.  A  network  of  n  neurons  can  learn  n-1  radars 
effectively.  Noisy  input  can  be  recalled  by  the  BSB  network  to  one  of  these  n-1  radars,  if  the 
weight  matrix  is  not  diagonal-dominant. 

When there is no separate learning phase, learning and recalling can be combined: a number of radar data samples are processed, after which the number of stable corners indicates the number of different radars (clusters). Further research is required to study and implement this possibility.

The  BSB  neural  network  can  also  be  used  in  other  applications.  TNO-FEL  has  studied  and 
implemented  a  Spatio-temporal  Pattern  Recognition  (SPR)  neural  network  to  recognize  garbled 
text  strings  in  messages  [P.P.  Meiler  1990].  This  recognition  can  also  be  done  using  a  BSB  neural 
network.  Each  character  of  the  string  is  represented  by  a  field  using  5  bit  Gray  coding.  This 
approach  may  be  more  efficient  than  the  SPR  network  because  there  is  no  time  sequence 
involved.  However,  the  number  of  iterations  required  for  the  BSB  network  to  converge  to  the 
correct  result  is  as  yet  unknown.  A  comparison  may  give  interesting  results. 


A.  van  Wezenbeek 
(Author) 
LINEAR ALGEBRA


1  Eigenvalues  and  eigenvectors 

Eigenvalues  and  eigenvectors  form  one  of  the  most  important  concepts  of  linear  algebra.  In  this 
section  it  is  shown  how  a  matrix  A  can  be  diagonalized,  thus  revealing  its  eigenvalues  and 
eigenvectors.  The  derivation  comes  from  [G.  Strang  1980]. 


$$Ax = \lambda x, \qquad \lambda \text{ eigenvalue}, \quad x \text{ eigenvector}$$
A symmetric matrix A is called positive semi-definite when all its eigenvalues are greater than or equal to zero, so λ_min ≥ 0, or equivalently x^TAx ≥ 0 for all x. When λ_min > 0, so x^TAx > 0 for all x ≠ 0, A is called positive definite.
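For symmetric matrices the criterion can be checked from the eigenvalues. As an illustration, using NumPy's eigvalsh (which assumes a symmetric input), the weight matrix of section 4.2.1 has eigenvalues 0.03 and 0.04 and is therefore positive definite:

```python
import numpy as np

W = np.array([[0.035, -0.005],
              [-0.005, 0.035]])   # weight matrix of section 4.2.1
eigvals = np.linalg.eigvalsh(W)   # eigenvalues of a symmetric matrix, ascending
print(eigvals)                    # approximately [0.03, 0.04]
print(bool(np.all(eigvals > 0)))  # True: W is positive definite

x = np.array([0.3, -0.7])
print(x @ W @ x > 0)              # x^T W x > 0, as positive definiteness requires
```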


2  Linear  operators 

In this section a summary is given of elementary differential operators [H.W. Sorenson 1980].

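The first order derivative of f (written as a row vector):

$$\frac{\partial f}{\partial x} = \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right) = (\nabla f)^T$$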

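The second order derivative of f, also called the Hessian (Hess), forms an (n×n) matrix:

$$\mathrm{Hess}(f) = \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right)^T = \frac{\partial}{\partial x}\nabla f$$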


The  following  derivatives  are  used: 
The chain rule:

$$\frac{\partial f(u)}{\partial x} = \frac{\partial f}{\partial u}\,\frac{\partial u}{\partial x}$$

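As an example, the chain rule can be used to find the derivative of f = x^TAx:

$$f = x^TAx, \qquad g(x,y) = x^Ty, \quad y = Ax$$

$$\frac{\partial f}{\partial x} = \frac{\partial g}{\partial x} + \frac{\partial g}{\partial y}\,\frac{\partial y}{\partial x} = y^T + x^TA = x^TA^T + x^TA = x^T(A + A^T)$$

If A = A^T, then ∇f(x) = 2Ax.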

Second order Taylor series approximation in the neighbourhood of x:

$$f(x+h) \approx f(x) + h^T\nabla f(x) + \frac{h^T\,\mathrm{Hess}(x)\,h}{2}$$
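Both results can be checked numerically for a non-symmetric A. The sketch below (NumPy, random test data) compares central finite differences of f(x) = x^TAx with x^T(A + A^T), and confirms that for this quadratic f the second order Taylor series is exact, with Hess = A + A^T.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))  # deliberately not symmetric
x = rng.normal(size=3)


def f(v):
    return v @ A @ v         # f(v) = v^T A v


# Central differences of f, compared with the analytic row vector x^T (A + A^T)
h = 1e-6
numeric = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(3)])
analytic = x @ (A + A.T)
print(np.allclose(numeric, analytic, atol=1e-6))  # True

# For a quadratic f the second order Taylor series is exact, with Hess = A + A^T
step = rng.normal(size=3)
taylor = f(x) + step @ analytic + step @ (A + A.T) @ step / 2
print(np.isclose(f(x + step), taylor))            # True
```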
DOCUMENTATION OF C-SOURCES

1  Short  description  of  each  file 

The  name  of  each  C  file  is  an  abbreviation  of  its  function.  This  function  operates  on  data.  Data 
has  a  certain  format  and  a  certain  type. 

The  ASCII  format  represents  readable  numbers  (integers  or  floats).  Binary  format  represents 
bytes. 

Scalar type indicates that the data must be interpreted as single entities. Vector type indicates that the data forms a sequence of vectors. Matrix type indicates that the data forms just one matrix. To use the BSB network for garbled string recognition, String type can also be chosen.

When the function specifies a conversion, it converts input with a certain format and type to output with a certain format and type. So a good mnemonic for a conversion file should indicate the formats and types it expects for its input and output. The first letter of the characterizing format and type is a good choice (A = ASCII, B = binary, S = scalar, V = vector, M = matrix, S = string). Unfortunately this gives rise to mnemonic names that are too long (at least 4 letters). Each name is therefore limited to 3 letters, and the full name it actually stands for is given as well.

Finally, for each file it is stated whether it contains a main program (and if so, the name of the executable it generates) or a routine. Common parameters for all programs (such as the number of neurons) are read from a single file, BSB.PAR.

2D: main for 2D, creates a two-dimensional plot consisting of a field of arrows; optionally it also generates a memory dump (picture file) or invokes PRNT to generate a print file

2DPL: routine 2Dpl, does the actual plot work

3D: main for 3D, creates a three-dimensional plot, with z-axis values forming the height above an x-y plane; optionally it also generates a memory dump (picture file) or invokes PRNT to generate a print file

ACB: routine (short for ASCBS, ASCII scalar convert to binary scalar), ASCII convert binary, conversion of formats, noise generation is optional

ACM: routine (short for AMCBM, ASCII matrix convert to binary matrix), ASCII convert matrix, conversion of formats

ACS: routine (short for ASCAS, ASCII scalar convert to ASCII string), ASCII convert string, conversion of types
ACV: routine (short for AVCBV, ASCII vector convert to binary vector), ASCII convert vector, conversion of formats

ADD:  main  for  ADD,  adds  two  picture  files  to  a  sum  file 

ALLOC: routines tmalloc and tcalloc, test whether space can be allocated and if so, allocate it, else the program stops

BCA: routine (short for BSCAS, binary scalar convert to ASCII scalar), binary convert ASCII, conversion of formats

BCV: routine (short for BSCBV, binary scalar convert to binary vector), binary convert vector, conversion of types, either according to thermometer or Gray coding

EIG: routine, calculates eigenvalues and eigenvectors

ENG:  routine,  calculates  energy 

GRPH:  routines  SetCursorPos  and  ReadCursorPos 

IO: routines tfopen and tfopent, test whether files can be opened and if so, open them, else the program stops

ISO: routine, calculates isoclines; optionally it also creates a memory dump or invokes PRNT to generate a print file

LRN:  routine,  learns  a  file  and  returns  a  weight  file 

MACB:  main  for  ACB 

MACM:  main  for  ACM 

MACS:  main  for  ACS 

MACV:  main  for  ACV 

MBCA:  main  for  BCA 

MBCV:  main  for  BCV 

MCA: routine (short for MBCMA, matrix binary convert to matrix ASCII), matrix convert ASCII, conversion of formats
MEIG:  main  for  EIG 
MENG:  main  for  ENG 
MISO:  main  for  ISO 

MJOB: main for JOB, a batch file for LRN and RCL

MLRN:  main  for  LRN 

MMCA:  main  for  MCA 

MPGN:  main  for  PGN 

MRCL:  main  for  RCL 

MSCA:  main  for  SCA 

MVCA:  main  for  VCA 

MVCB:  main  for  VCB 




MVCW:  main  for  VCW 

MWCV:  main  for  WCV 

PGN: routine, finds prototypes

PLT: routine, plots a picture file

PRNT: routine, translates a plot to a print file

R2D:  read  parameters  for  2D 

R3D:  read  parameters  for  3D 

RACB:  read  parameters  for  ACB 

RACV:  read  parameters  for  ACV 

RBSB:  read  general  BSB  parameters 

RCA: main for RCA (short for RBSCAS, radar binary scalar convert to ASCII scalar), radar convert ASCII, conversion of the format of a radar file (generated by the LOCK-ON simulator) to ASCII scalar format

RCL: routine, recalls a file given a weight file

RD:  read  common  parameters  for  2D,  3D  and  ISO 

REIG:  read  parameters  for  EIG 
RENG:  read  parameters  for  ENG 

RFCP: read parameter RFCP (fraction connection parameter, 1 is fully connected, 0 is not connected)

RISO:  read  parameters  for  ISO 
RJOB:  read  parameters  for  JOB 
RNRF:  read  parameter  NRF 

RSOU:  read  parameter  source  (whether  ASCII  or  scalar,  both  for  RCL  and  LRN) 

RUTL:  routine  search,  a  read  utility,  searches  in  parameter  file  for  a  keyword,  stops 
immediately  after  keyword  if  found,  else  program  stops 
RVCW: read parameter for VCW
RWCV: read parameter for WCV

SCA: routine (short for ASCAS, ASCII string convert to ASCII scalar), string convert to ASCII, conversion of types
TEST:  main  for  TEST 
UTIL:  routines  normalize,  cmwv,  mdcon,  limit 

VCA: routine (short for VBCVA, vector binary convert to vector ASCII), vector convert ASCII, conversion of formats

VCB: routine (short for VBCSB, vector binary convert to scalar binary), vector convert binary, conversion of types
VCW: routine (short for VBCWB, vector binary convert to weight binary), vector convert weight, generates the weight matrix

WCV:  routine  (short  for  WBCVB,  weight  binary  convert  to  vector  binary),  weight  convert  vector, 
given  a  weight  matrix,  it  generates  the  BSB  recall  vector 
The lrn program invokes the following programs, depending on the value of LrnSource (BSB.PAR): when LrnSource = string: sca, acb, bcv, vcw; when LrnSource = scalar: acb, bcv, vcw; when LrnSource = vector: acv, vcw. So lrn transforms an input file of a certain type into a binary matrix or weight file. The rcl program invokes the following programs, depending on the value of RclSource (BSB.PAR): when RclSource = string: sca, acb, bcv, wcv, vcb, bca, acs; when RclSource = scalar: acb, bcv, wcv, vcb, bca; when RclSource = vector: acv, wcv, vca. So rcl transforms an input file of a certain type, via a weight matrix, into an output file of the same type as the input file.
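The dispatch described above amounts to two lookup tables. The Python sketch below merely restates the program sequences listed in this paragraph as data; it is an illustration, not part of the original C sources, and the names LRN_CHAIN, RCL_CHAIN and chain are hypothetical.

```python
# Hypothetical restatement: program chains run by lrn and rcl, keyed on the
# LrnSource / RclSource values read from BSB.PAR
LRN_CHAIN = {
    "string": ["sca", "acb", "bcv", "vcw"],
    "scalar": ["acb", "bcv", "vcw"],
    "vector": ["acv", "vcw"],
}
RCL_CHAIN = {
    "string": ["sca", "acb", "bcv", "wcv", "vcb", "bca", "acs"],
    "scalar": ["acb", "bcv", "wcv", "vcb", "bca"],
    "vector": ["acv", "wcv", "vca"],
}


def chain(table, source):
    """Return the sequence of conversion programs for a given source type."""
    try:
        return table[source]
    except KeyError:
        raise ValueError(f"unknown source type: {source!r}") from None


print(chain(LRN_CHAIN, "scalar"))  # ['acb', 'bcv', 'vcw']
```

Every lrn chain ends in vcw (the weight file), and every rcl chain passes through wcv (the recall step) before converting back to the input type.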